[...] the diversity of chemical synthesis and the [...] method for
drug screening. A process of alternating parallel combinatorial
synthesis is used to encode individual members of a large library of
chemicals with unique nucleotide sequences. After the chemical entity
is bound to a target, the genetic tag can be amplified by replication
and utilized for enrichment of the bound molecules by serial
hybridization to a subset of the library. The nature of the chemical
structure bound to the receptor is decoded by sequencing the
nucleotide tag.



There is an increasing need to find new molecules that can
effectively modulate a wide range of biological processes, for
applications in medicine and agriculture. A standard way to search
for novel chemicals is to screen collections of natural materials,
such as fermentation broths, plant extracts, or libraries of
synthesized molecules. Assays can range in complexity from simple
binding reactions to elaborate physiological preparations. The
screens often only provide leads, which then require further
improvement either by empirical methods or by chemical design. The
process is time-consuming and costly but is unlikely to be replaced
totally by rational methods even when they are based on detailed
knowledge of the three-dimensional structure of the target molecules.
Thus, what we might call "irrational drug design" (the process of
selecting the correct molecules from large ensembles or repertoires)
requires continual improvement both in the generation of repertoires
and in the methods of selection.

Recently there have been several developments in using peptides or
nucleotides to provide libraries of compounds for discovery of leads.
The methods were originally developed to speed up the determination
of epitopes recognized by monoclonal antibodies. For example, the
standard serial process of stepwise search of synthetic peptides has
been replaced by a variety of highly sophisticated methods in which
large arrays of peptides are synthesized in parallel and screened
with acceptor molecules labeled with fluorescent or other reporter
groups (1, 2). The sequence of any effective peptide can be decoded
from its address in the array. In another approach, combinatorial
libraries of peptides are synthesized on resin beads such that each
resin bead contains about 20 pmol of the same peptide (3). The beads
are exposed to labeled acceptor molecules. Those with bound acceptor
are identified by visual inspection and physically removed, and the
peptide is sequenced directly. In principle, this method could be
used with other chemical entities, provided one has a sensitive
method for sequence determination.

A different method of solving the problem of identification in a
combinatorial peptide library is used by Houghten et al. (4). For
hexapeptides of the 20 natural amino acids, separate libraries are
synthesized, each with the first two amino acids fixed and the
remaining four positions occupied by all possible combinations. An
assay, based on competition for binding or some other activity, is
then used to find the library with an active peptide. On the basis of
this result, 20 new libraries are synthesized and assayed to
determine the effective amino acid in the third position. The process
is reiterated in this fashion until the active hexapeptide is
defined. This is analogous to the method used in searching a
dictionary; the peptide is decoded by using a series of sieves, and
this makes the search logarithmic. A powerful biological method has
recently been described in which the library of peptides is presented
on the surface of a bacteriophage such that each phage displays a
particular peptide and contains within its genome the corresponding
DNA sequence (5, 6). The library is prepared by synthesizing a
repertoire of random oligonucleotides to generate all combinations,
followed by their insertion into a phage vector. Each of the
sequences is cloned in one phage, and the relevant peptide can be
selected by finding those that bind to the particular target. The
phages recovered in this way can be amplified and the selection
repeated. The sequence of the peptide is decoded by sequencing the
DNA. Another "genetic" method has been applied by Tuerk and Gold (7)
and Ellington and Szostak (8), using libraries of synthetic
oligonucleotides that themselves are selected for binding to an
acceptor and then amplified by the polymerase chain reaction (PCR).
In this case, however, the repertoire is limited to nucleotides or
nucleotide analogues that preserve specific Watson-Crick pairing and
can be copied by a polymerase.
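
The saving offered by the stepwise search of Houghten et al. can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that fixing the first two of six positions over a 20-letter alphabet implies 20 x 20 starting libraries, followed by 20 libraries for each of the four remaining positions.

```python
def deconvolution_assays(alphabet_size: int, length: int, fixed_first: int) -> int:
    """Number of libraries assayed when positions are resolved stage by stage.

    Assumption: the first stage fixes `fixed_first` positions in every possible
    way, and each later stage tests `alphabet_size` libraries for one position.
    """
    first_stage = alphabet_size ** fixed_first
    later_stages = alphabet_size * (length - fixed_first)
    return first_stage + later_stages


def exhaustive_assays(alphabet_size: int, length: int) -> int:
    """Number of assays if every peptide were tested individually."""
    return alphabet_size ** length


if __name__ == "__main__":
    print(deconvolution_assays(20, 6, 2))  # 480 libraries in total
    print(exhaustive_assays(20, 6))        # 64000000 individual hexapeptides
```

Under these assumptions a few hundred library assays stand in for tens of millions of individual tests, which is the sense in which the search behaves like a dictionary lookup.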

The main advantages of the genetic methods reside in the capacity for
cloning and amplification of DNA sequences, which allows enrichment
by serial selection and provides a facile method for decoding the
structure of active molecules. However, the genetic repertoires are
restricted to nucleotides and peptides composed of natural amino
acids, whereas a more extensive chemical repertoire is required to
populate the entire universe of binding sites. In contrast, chemical
methods can provide limitless repertoires, but they lack the capacity
for serial enrichment and there are difficulties in discovering the
structures of selected active molecules. We have now devised a way of
combining the virtues of both methods through the construction of
encoded combinatorial chemical libraries, in which each chemical
sequence is labeled by an appended "genetic" tag, itself constructed
by chemical synthesis. In effect, we implement a "retrogenetic" way
of specifying each chemical structure.

In outline, we perform two alternating parallel combinatorial
syntheses so that the genetic tag is chemically linked to the
chemical structure being synthesized. In each case, addition of a
monomeric chemical unit to a polymeric structure is followed by
addition of an oligonucleotide sequence, which is defined as
"encoding" that chemical unit. The library is built up by the
repetition of this process after pooling and division. Active
molecules are selected by binding to a receptor, and amplified copies
of their retrogenetic tags are obtained by the PCR. DNA strands with
the appropriate polarity can then be used to enrich for a subset of
the library by hybridization with the matching tags, and the process
can then be repeated on this subset. Thus serial enrichment is
achieved by a process of purification, exploiting linkage to a
nucleotide sequence that can be amplified. Finally, the structures of
the chemical entities are decoded by cloning and sequencing the
products of PCR.

Design of the Code and the Genetic Tag

It is essential to choose a coding representation in such a way that
no significant part of the sequence can occur by chance in some other
unrelated combination. Suppose we allocate a triplet to each of the
chemical units used. Then, because the method allows us to cover all
combinations and permutations of an alphabet of chemical units,
unless we are careful, we could find that two different combinations
have closely related sequences which differ only by a frame shift and
which could not be easily distinguished by hybridization. This,
potentially the greatest source of errors, can be eliminated by
choosing a commaless code (9). The particular commaless triplet code
that we have chosen allows 20 unique representations, as shown in
Table 1.
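
The commaless condition can be checked mechanically: in any two adjacent code words, neither of the two out-of-frame triplets that straddle the junction may itself be a code word. The sketch below is a minimal check under stated assumptions. `SENSE` lists the upper-case triplets as read from Table 1; CTA is included on the assumption that it is the twentieth sense triplet, since the text gives 20 representations. The function name is hypothetical.

```python
from itertools import product

# Sense triplets as read from Table 1 (CTA is an assumption, see lead-in).
SENSE = [
    "TTC", "TTA", "TTG", "CTC", "CTA", "CTG",
    "ATC", "ATA", "ATG", "ACC", "ACA", "ACG",
    "GTC", "GTA", "GTG", "GCC", "GCA", "GCG",
    "GAA", "GAG",
]


def is_commaless(words):
    """Return True if no frame-shifted triplet across any junction is a word."""
    word_set = set(words)
    for first, second in product(words, repeat=2):
        joined = first + second
        # The two out-of-frame readings that overlap the junction.
        if joined[1:4] in word_set or joined[2:5] in word_set:
            return False
    return True


if __name__ == "__main__":
    print(len(SENSE), is_commaless(SENSE))
    # A homopolymer triplet can never belong to a commaless code:
    # TTT followed by TTT reads as TTT in every frame.
    print(is_commaless(SENSE + ["TTT"]))
```

Because the check runs over every ordered pair of words, it also covers the repeated and combined triplets used later for sextuplet codes, provided both halves are drawn from the same set.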

The sequences for the PCR primers must be chosen so that they do not
occur within any coding segment and so that they can be readily
removed from the final PCR product, because we do not want them to
dominate the selective hybridization. This can be achieved by
building in sites for restriction enzymes with the appropriate
polarity of cutting. One of the restriction enzymes should cut at a
site that permits the incorporation of a biotinylated nucleotide,
such as biotinyl-dUTP, into the strand complementary to the coding
strand.

All of the above conditions have been met in the following design:

5'-AGCTACTTCCCAAGG[coding sequence]GGGCCCTATTCTTAG-3'
3'-TCGATGAAGGGTTCC[anticoding strand]CCCGGGATAAGAATC-5'
            Sty I                    Apa I

After cleavage with both restriction enzymes we have

5'-AGCTACTTCC CAAGG[coding sequence]GGGCC CTATTCTTAG-3'
3'-TCGATGAAGGGTTC C[anticoding strand]C CCGGGATAAGAATC-5'

Table 1. Commaless code used in this study

ttt  tct  tat  tgt
TTC  tcc  tac  tgc
TTA  tca  taa  tga
TTG  tcg  tag  tgg
ctt  cct  cat  cgt
CTC  ccc  cac  cgc
CTA  cca  caa  cga
CTG  ccg  cag  cgg
att  act  aat  agt
ATC  ACC  aac  agc
ATA  ACA  aaa  aga
ATG  ACG  aag  agg
gtt  gct  gat  ggt
GTC  GCC  gac  ggc
GTA  GCA  GAA  gga
GTG  GCG  GAG  ggg

"Sense triplets" are XYZ; nonsense triplets are xyz.

The internal fragment can be cloned in an appropriate vector to
sequence the individuals. The terminal overhang of the Sty I site can
be filled in with dCTP and biotinyl-dUTP (BTP), which, because an
asymmetric site was chosen, will append the biotinylated nucleotides
to only one of the cleavage products.

5'-AGCTACTTCCC CAAGG[coding sequence]GGGCC CTATTCTTAG-3'
3'-TCGATGAAGGGTTC BBCC[anticoding strand]C CCGGGATAAGAATC-5'

The biotinylated fragment can be bound to avidin and, after
denaturation, provides the strand suitable for hybridization and
selection of the appropriate coding strands:

Avidin-BBCC[anticoding strand]C

The two PCR primers are the two sequences 5'-AGCTACTTCCCAAGG (Sty I
primer) and 5'-CTAAGAATAGGGCCC (Apa I primer). Adding a biotin to the
5' end of the Apa I primer would allow the isolation of the whole
strand containing the anticoding sequence.

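
The relationship between the two primers and the flanking sequences of the design can be verified with a few lines of code. This is a sketch only; the variable and function names are hypothetical, and the recognition sequences used (CCAAGG as an instance of the Sty I site CCWWGG, and GGGCCC for Apa I) are the standard ones for these enzymes.

```python
# Flanking sequences of the coding strand, as given in the design above.
FLANK_5 = "AGCTACTTCCCAAGG"  # upstream of the coding sequence (Sty I side)
FLANK_3 = "GGGCCCTATTCTTAG"  # downstream of the coding sequence (Apa I side)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# The Sty I primer is identical to the 5' flank of the coding strand;
# the Apa I primer anneals to the coding strand, so it is the reverse
# complement of the 3' flank.
STY_PRIMER = FLANK_5
APA_PRIMER = reverse_complement(FLANK_3)

if __name__ == "__main__":
    print(STY_PRIMER)  # AGCTACTTCCCAAGG
    print(APA_PRIMER)  # CTAAGAATAGGGCCC
    print("CCAAGG" in FLANK_5, "GGGCCC" in FLANK_3)  # True True
```
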
We should have at least 15 nucleotides in the coding region for
effective hybridization. Thus, in a library of degree d ≥ 5, that is,
composed of five or more successive chemical units, we could code
each unit by a triplet. That would allow an alphabet (A) of up to 20
different units, each corresponding to one of the triplets defined
above. The complexity of the combinatorial library is A^d. Libraries
with a smaller degree, say d = 3, should be coded by sextuplets,
which, in the simplest case, could be a repeated triplet (this size
is chosen because any combination of triplets still obeys the
commaless condition). In the same way, the size of the alphabet can
be extended by using combinations of triplets to code for the
chemical units.
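
The trade-off between library degree, code-word size and library complexity can be tabulated directly. The following sketch assumes only what is stated above: a minimum of 15 coding nucleotides, code words built from whole triplets, and a complexity of A^d. The function names are hypothetical.

```python
import math


def library_complexity(alphabet_size: int, degree: int) -> int:
    """Number of distinct members in a library: A ** d."""
    return alphabet_size ** degree


def triplets_per_unit(degree: int, min_coding_nt: int = 15) -> int:
    """Smallest number of triplets per chemical unit that gives a coding
    region of at least `min_coding_nt` nucleotides."""
    return math.ceil(min_coding_nt / (3 * degree))


def coding_length(degree: int, triplets: int) -> int:
    """Length of the coding region in nucleotides."""
    return 3 * triplets * degree


if __name__ == "__main__":
    for degree in (3, 5, 6):
        t = triplets_per_unit(degree)
        print(degree, t, coding_length(degree, t))
    print(library_complexity(20, 5))  # 3200000
```

For d = 3 the function returns two triplets per unit, that is, the sextuplets suggested in the text, giving an 18-nucleotide coding region.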

A Formal Example 

As an illustration we discuss how a library of degree d = 3 is made
with an alphabet of two amino acids, glycine and methionine. In this
case, we use sextuplets to give us a reasonable length of coding
sequence. To make the sequences as different as possible we code each
amino acid by a combination of two different triplets as follows:

Gly = CACATG, Met = ACGGTA

Step 1. We begin with some appropriate linker, LINK, attached to some
solid-state surface and synthesize the first PCR oligonucleotide
sequence on one end, in the usual 3'-to-5' direction, to give

GGGCCCTATTCTTAG-LINK

Step 2. This product is divided into two aliquots for parallel
synthesis. In each synthesis, one amino acid is added to LINK and the
oligonucleotide sequence is extended by the corresponding code to
give the following products:

CACATGGGGCCCTATTCTTAG-LINK-Gly
ACGGTAGGGCCCTATTCTTAG-LINK-Met

Step 3. The elongated products are pooled and again split into two
parts for parallel synthesis, yielding

CACATGCACATGGGGCCCTATTCTTAG-LINK-Gly-Gly
CACATGACGGTAGGGCCCTATTCTTAG-LINK-Met-Gly
ACGGTACACATGGGGCCCTATTCTTAG-LINK-Gly-Met
ACGGTAACGGTAGGGCCCTATTCTTAG-LINK-Met-Met

Steps 4 and 5. Once more the products are pooled and divided into two
aliquots for parallel synthesis. This results in an ensemble of eight
tripeptide sequences, each encoded by a unique sequence of 18
nucleotides. The second PCR oligonucleotide is added to the ensemble
of products to give

AGCTACTTCCCAAGGCACATGCACATGCACATGGGGCCCTATTCTTAG-LINK-Gly-Gly-Gly
AGCTACTTCCCAAGGCACATGCACATGACGGTAGGGCCCTATTCTTAG-LINK-Met-Gly-Gly
AGCTACTTCCCAAGGCACATGACGGTACACATGGGGCCCTATTCTTAG-LINK-Gly-Met-Gly
AGCTACTTCCCAAGGCACATGACGGTAACGGTAGGGCCCTATTCTTAG-LINK-Met-Met-Gly
AGCTACTTCCCAAGGACGGTACACATGCACATGGGGCCCTATTCTTAG-LINK-Gly-Gly-Met
AGCTACTTCCCAAGGACGGTACACATGACGGTAGGGCCCTATTCTTAG-LINK-Met-Gly-Met
AGCTACTTCCCAAGGACGGTAACGGTACACATGGGGCCCTATTCTTAG-LINK-Gly-Met-Met
AGCTACTTCCCAAGGACGGTAACGGTAACGGTAGGGCCCTATTCTTAG-LINK-Met-Met-Met
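
The pool-and-divide procedure of this example can be mimicked in a few lines, which also makes the decoding rule explicit. The sketch below is a bookkeeping model only, not a description of the chemistry; the function names and the data layout are hypothetical. It relies on two facts from the steps above: each new code word is added at the 5' end of the growing tag, and each new amino acid is added at the far end of the peptide from LINK.

```python
CODE = {"Gly": "CACATG", "Met": "ACGGTA"}
PRIMER_3 = "GGGCCCTATTCTTAG"  # first PCR sequence, synthesized next to LINK
PRIMER_5 = "AGCTACTTCCCAAGG"  # second PCR sequence, added last


def split_and_pool(alphabet, degree):
    """Return (tag, peptide) pairs after `degree` rounds of pool and divide."""
    pool = [(PRIMER_3, [])]
    for _ in range(degree):
        next_pool = []
        for unit in alphabet:              # one aliquot per chemical unit
            for tag, peptide in pool:
                # The tag grows in the 3'-to-5' direction, so the newest
                # code word sits at the left; the peptide grows away from LINK.
                next_pool.append((CODE[unit] + tag, peptide + [unit]))
        pool = next_pool                   # pooling before the next division
    return [(PRIMER_5 + tag, peptide) for tag, peptide in pool]


def decode(tag):
    """Recover the peptide (listed outward from LINK) from a full tag."""
    coding = tag[len(PRIMER_5):len(tag) - len(PRIMER_3)]
    inverse = {word: unit for unit, word in CODE.items()}
    words = [coding[i:i + 6] for i in range(0, len(coding), 6)]
    return [inverse[word] for word in reversed(words)]


if __name__ == "__main__":
    for tag, peptide in split_and_pool(["Gly", "Met"], 3):
        print(f"{tag}-LINK-{'-'.join(peptide)}")
```

Run with the two-letter alphabet and three rounds, the model lists the same eight tagged tripeptides as the final step of the example, and `decode` inverts each tag back to its peptide.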



Implementation 

Although natural amino acids are used in the example discussed above,
the system is not limited to these, nor, for that matter, to
peptides. The chemistry required for making encoded libraries is
constrained only by the compatibility of the two alternating
syntheses. Partly this involves the choice of the protecting groups,
and the methods used to deprotect one chain while the other remains
blocked. And, of course, each product needs to survive through the
synthesis of the other. One can imagine many different ways of
joining the chemical entities together, and one could even use mixed
syntheses, provided that the rules of mutual compatibility are
obeyed.

We have recently, in principle, solved the synthetic procedures for
peptides (K. Janda, S. Ramcharitar, S.B., and R.A.L., unpublished
results). Even within this field there is a choice of alphabets that
extends well beyond the 20 natural α-amino acids. The only
requirement is that we be able to make an amide bond. Thus, the amino
and carboxylic groups can be located on a wide variety of compounds
so that we can make libraries with many different backbone
structures. We can also combine different backbones, if we define
alphabets where, for example, both the number of carbon atoms and
their configurations in the backbone are varied. New amino acids can
be easily invented with unusual heterocyclic rings, such as
thiazole-alanine or purine-alanine. These rings are components of
natural effector molecules and often provide core chemical functions
for important drugs. Libraries made with such alphabets will allow us
to explore the combinatorial association of known effector chemical
functions.

It is also useful to consider how large the combinatorial library
should be. The PCR provides a very sensitive detection method,
allowing even a few molecules to be seen. However, we need to have
some reasonable concentration of each of the species present to cross
the binding threshold of the acceptor molecule being assayed. If, for
example, we set this as 1 µM and want 1 ml of the library, then we
need to make at least 1 nmol of each of the species. Libraries with
complexities of up to 10^4, giving us a total amount of 10 µmol of
product, would seem reasonable. Because of this reciprocal
relationship, more complex libraries could be made if the binding
threshold is lowered.
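
The arithmetic behind these figures is a unit conversion, sketched below. The function names are hypothetical; the only assumption is the identity that a concentration in µM multiplied by a volume in ml gives an amount in nmol.

```python
def amount_per_species_nmol(threshold_um: float, volume_ml: float) -> float:
    """Amount of each species needed to reach the threshold concentration.

    1 µM x 1 ml = 1 nmol, so the product of the two numbers is already in nmol.
    """
    return threshold_um * volume_ml


def total_product_umol(complexity: int, threshold_um: float, volume_ml: float) -> float:
    """Total material in the library, in µmol."""
    return complexity * amount_per_species_nmol(threshold_um, volume_ml) / 1000.0


def max_complexity(total_umol: float, threshold_um: float, volume_ml: float) -> float:
    """Largest complexity reachable with a fixed total amount of product."""
    return total_umol * 1000.0 / amount_per_species_nmol(threshold_um, volume_ml)


if __name__ == "__main__":
    print(amount_per_species_nmol(1.0, 1.0))      # 1.0 nmol per species
    print(total_product_umol(10**4, 1.0, 1.0))    # 10.0 µmol in total
    # Reciprocal relationship: a tenfold lower threshold allows roughly a
    # tenfold more complex library for the same total amount.
    print(max_complexity(10.0, 0.1, 1.0))
```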

Discussion 

Traditional chemical synthesis proceeds by careful design,
sequentially linking atoms or groups of atoms to a growing core
structure. The process has the advantage that the product of each
step can be analyzed, thereby allowing continuous evaluation of the
effectiveness of a given strategy. Indeed, the analyzed results of
these individual steps ultimately become the corpus of synthetic
organic chemistry. A major technical revolution occurred with the
advent of solid-state methods for the synthesis of polymeric
molecules (10). Here, since a limited number of suitably protected
oligomeric units are added via a common covalent bond, the results of
the individual transformations can be predicted, and, to first
approximation, it is necessary to analyze only the final product. In
addition, the relationship of the monomeric units to each other and
the extent of conformational space that is occupied can be estimated.
Our method permits the study of the efficacy of combinatorial
associations of diverse chemical units without the necessity of
either synthesizing them one at a time or knowing their interactions
in advance. It also allows easy identification of the most effective
molecules through a common method of nucleic acid sequencing. Once
the chemical polymers are decoded, more precise questions about
critical interactions and conformations can be asked by reversion to
classical chemical methods. Further, we expect that many receptors
will interact with sets of related but not identical chemical
entities such that major clues as to critical interactions can be
deduced from the shared features of the sets.

Our method also provides a method of amplification, again by
exploiting a common procedure of nucleic acid hybridization. In any
screening procedure where large libraries of compounds or effector
molecules are being studied, the absolute number of different
nonspecific interactions may be large, but the specific ligand or
effector is represented many more times than any individual
background molecule. In such a situation the signal-to-noise ratio
rapidly increases after repeated cycles of amplification and
selection, and the specific molecule becomes highly enriched after
only a few iterations. For both identification and selection, our
method exploits the power of genetic systems. By coupling genetics
and the versatility of organic chemical synthesis we have extended
the range of analysis to chemicals that are not themselves part of
biological systems.
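
How quickly repeated selection enriches a specific molecule can be illustrated with a toy calculation. This is not a model taken from the text: it assumes, purely for illustration, that each cycle of selection and amplification multiplies the ratio of specific to background molecules by a constant factor. The function name and the numerical values are hypothetical.

```python
def specific_fraction(initial_fraction: float, enrichment_per_cycle: float, cycles: int) -> float:
    """Fraction of the pool that is the specific molecule after some cycles.

    Assumption: every cycle multiplies the specific-to-background ratio
    (the odds) by the same enrichment factor.
    """
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= enrichment_per_cycle ** cycles
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical numbers: one specific molecule in 10,000 and a
    # 100-fold enrichment per cycle.
    for cycle in range(4):
        print(cycle, specific_fraction(1e-4, 100.0, cycle))
```

With these illustrative numbers the specific molecule goes from a trace component to the dominant species within three cycles, which is the qualitative behaviour described above.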

We thank Kim Janda, Bernie Gilula, and Jerry Joyce for helpful